Application of Statistical Estimation Procedures to
the  Identification  Problem 

by 

R.  P.  Wishner  and  J.  C.  Lindenlaub 


ABSTRACT 

The  method  of  maximum  likelihood  parameter  estimation  is  applied 
to  the  problem  of  measuring  parameters  of  an  unknown  linear  filter  or 
control  system  from  input-output  data  when  it  is  assumed  that  the  output 
signal  is  corrupted  with  an  additive  Gaussian  noise  signal.  The  physical 
realization  suggested  by  the  integral  formulation  of  the  estimation  technique 
is  discussed  and  illustrated.  Approximate  expressions  for  the  parameter 
estimates  and  the  covariance  matrix  of  the  errors  in  the  parameter 
estimates  are  obtained  in  the  strong  signal  case.  This  analysis  also 
has  applications  to  the  adaptive  radar  problem. 
Introduction

The communication engineer has exploited statistical parameter estimation techniques for a number of years, particularly in problems concerning signal detection such as those that occur in radar. It has occurred to the authors that such techniques are useful to the control engineer as well. A natural application of statistical estimation techniques arises in adaptive control problems. The viewpoint that an adaptive process incorporates the ideas of system identification, decision, and modification (1) places this fact in evidence; the identification problem is nothing more than a problem in parameter estimation.

Some authors writing in the field of identification seem to have come close to noting this equivalence. Particular identification schemes employing cross correlation (2), matched filters (3), parameter tracking models (4), etc. have been proposed, but we are unaware of any explicit statements to the effect that these specific realizations were motivated by the viewpoint of statistical parameter estimation. Once this equivalence between identification and statistical parameter estimation is noted, a large amount of existing mathematical technique may be brought to bear on the identification problem. The purpose of this paper is, in effect, to transpose the method of maximum likelihood parameter estimation into the language of the control engineer. We consider the problem of estimating the unknown parameters of a control system when the signals are corrupted with noise.


1. Cooper, G. R., and Gibson, J. E., et al., "Survey of the Philosophy and State of the Art of Adaptive Systems," Technical Report No. 1, Contract AF33(616)-6890, PRF 2358, Purdue University, July, 1960.

2. Anderson, G. W., Aseltine, J. A., Marcini, A. R., and Sarture, C. W., "A Self-Adjusting System for Optimum Dynamic Performance," IRE National Convention Record, pt. 4, 1958.

3. Lichtenberger, W. W., "A Technique of Linear System Identification Using Correlation Filters," IRE Transactions - PGAC, Vol. AC-6, No. 2, May, 1961.

4. Margolis, M., and Leondes, C. T., "A Parameter Tracking Servo for Adaptive Control Systems," IRE Transactions - PGAC, Vol. AC-4, No. 2, November, 1959.
The basic identification problem is illustrated in Fig. 1. $K g(t;\alpha)$ is the impulse response of the unknown system; the constant $K$ is a convenient scale factor, and the vector $\alpha$ places in evidence the dependency of the impulse response upon the unknown parameters $\alpha = \{\alpha_1, \alpha_2, \ldots, \alpha_p\}$. It is assumed that $g(t)$ is realizable and that the observation time of the output, $T$, is chosen so that $g(t) \approx 0$ for $t > T$. The output signal $K s(t;\alpha)$ is corrupted with an additive noise signal, $n(t)$, which is assumed to be Gaussian and to have a known continuous autocorrelation function $R(t,s)$. Estimates of the unknown parameters $K$ and $\alpha$ are to be based upon measurements of the output signal plus noise, $y(t)$, and the input signal, $x(t)$.

Maximum likelihood estimates of the set of unknown parameters, $\{\alpha_i\}$, are obtained. The set $\hat{\alpha}$ which maximizes the conditional probability function $p(y|\alpha)$ is known as the maximum likelihood estimate of $\alpha$. $p(y|\alpha)$ is considered to be a function of the $\{\alpha_i\}$, and as such, is called the likelihood function. The notation $L(\alpha)$ will be used to emphasize the dependence upon $\alpha$. Maximum likelihood estimates have the advantage of yielding an efficient estimate, that is, one with minimum variance, if such an estimate exists (5). Excellent discussions of maximum likelihood as well as other parameter estimation techniques can be found in references (5), (6) and (7).


5. Cramer, H., Mathematical Methods of Statistics, Princeton University Press, Princeton, N. J., 1946.

6. Helstrom, C. W., Statistical Theory of Signal Detection, Pergamon Press, New York, N. Y., 1960.

7. Davenport, W. B., and Root, W. L., An Introduction to the Theory of Random Signals and Noise, McGraw-Hill Book Company, Inc., New York, N. Y., 1958.
Maximum Likelihood Estimates of $\alpha$

Derivation of the maximum likelihood estimates can be obtained by expanding the noise process, as well as the other quantities of interest $x(t)$, $y(t)$, $s(t;\alpha)$ and $g(t;\alpha)$, in a series of orthogonal functions. Then $n(t)$, $x(t)$, etc. can be represented by the coefficients of this series. The Karhunen-Loeve expansion (7), (8), (9), (10) of $n(t)$ will be used because the coefficients of this series are uncorrelated. Thus, if $n(t)$ has a continuous correlation function $R(t,s)$, then $n(t)$ can be represented over the interval $(0, T)$ by the series

$$n(t) = \underset{N \to \infty}{\mathrm{l.i.m.}} \sum_{k=1}^{N} n_k\,\phi_k(t) \tag{1}$$

Here the set of functions $\{\phi_k(t)\}$ (assumed to be a complete set) are the eigenfunctions associated with the eigenvalues $\{\sigma_k^2\}$ of the integral equation

$$\int_0^T R(t,s)\,\phi_k(s)\,ds = \sigma_k^2\,\phi_k(t) \tag{2}$$

and the coefficients $n_k$ (the observables of the noise process) are

$$n_k = \int_0^T n(t)\,\phi_k(t)\,dt \tag{3}$$
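The statement that the Karhunen-Loeve coefficients are uncorrelated can be checked in a few lines. The sketch below assumes zero-mean noise, so that $R(t,s) = E\{n(t)\,n(s)\}$, and eigenfunctions normalized to be orthonormal on $(0, T)$; both assumptions are made here for illustration. It leads to the property the paper states as Eq. 4.

```latex
\begin{aligned}
E\{n_k n_\ell\}
  &= \int_0^T\!\!\int_0^T E\{n(t)\,n(s)\}\,\phi_k(t)\,\phi_\ell(s)\,ds\,dt
   = \int_0^T \phi_k(t)\left[\int_0^T R(t,s)\,\phi_\ell(s)\,ds\right]dt \\
  &= \sigma_\ell^2 \int_0^T \phi_k(t)\,\phi_\ell(t)\,dt
   = \sigma_k^2\,\delta_{k\ell}
\end{aligned}
```

The first equality uses Eq. 3 for both coefficients, the second uses the definition of $R(t,s)$, and the third uses the integral equation, Eq. 2.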


7. ibid.

8. Loeve, M., Probability Theory, D. Van Nostrand, Princeton, N. J., 1955.

9. Grenander, U., "Stochastic Processes and Statistical Inference," Arkiv For Matematik, 1950.

10. Kelly, E. J., Reed, I. S., and Root, W. L., "The Detection of Radar Echoes in Noise. I and II," J. Soc. Indust. Appl. Math., Vol. 8, No. 2, June, 1960, (I) and Vol. 8, No. 3, September, 1960, (II).
Hereafter, all expansions similar to Eq. 1 should be considered as convergent in the mean and the explicit notation used in Eq. 1 will be dropped. As mentioned above, the coefficients of the series in Eq. 1 have the property

$$E\{n_k n_\ell\} = \sigma_k^2\,\delta_{k\ell} \tag{4}$$

so that $n(t)$ may be represented by the sequence of uncorrelated random variables $\{n_k,\ k = 1, 2, \ldots\}$.

Expressions analogous to Eqs. 1 and 3 are assumed to exist for $x(t)$, $g(t;\alpha)$, $s(t;\alpha)$, and $y(t)$. The constant $K$ is chosen so that

$$\sum_{k=1}^{\infty} \frac{s_k^2(\alpha)}{\sigma_k^2} = 1 \tag{5}$$

for all $\alpha$. Note that this normalization affects the meaning of $s(t;\alpha)$ and $g(t;\alpha)$. The noise, $n(t)$, is assumed to be a zero mean Gaussian process. In this case the $n_k$'s are Gaussian random variables with zero means and variances $\sigma_k^2$. Similarly the $y_k$'s will be jointly Gaussian with means $K s_k$ and variances $\sigma_k^2$. The likelihood function for the Gaussian case is

$$L(\alpha) = p(y|\alpha) = \prod_{k=1}^{\infty} (2\pi\sigma_k^2)^{-1/2} \exp\left[-\frac{\left(y_k - K s_k(\alpha)\right)^2}{2\sigma_k^2}\right] = \prod_{k=1}^{\infty} (2\pi\sigma_k^2)^{-1/2} \exp\left[-\frac{y_k^2 - 2 y_k K s_k(\alpha) + K^2 s_k^2(\alpha)}{2\sigma_k^2}\right] \tag{6}$$

The $\alpha$ which maximizes $L(\alpha)$ is desired. Since the first term in the exponential is not a function of the $\alpha$ it may be disregarded. Also $L(\alpha)$ is maximized when $\ln L(\alpha)$ is a maximum so that the desired set of estimates is the set that maximizes the expression
$$\sum_{k=1}^{\infty} \frac{2 y_k K s_k(\alpha) - K^2 s_k^2(\alpha)}{2\sigma_k^2} \tag{7}$$

Completing the square in Eq. 7 and making use of Eq. 5 the estimation procedure can be written as

$$\max_{\alpha,\,K} \left[ -\frac{1}{2}\left(K - \sum_{k=1}^{\infty} \frac{y_k s_k(\alpha)}{\sigma_k^2}\right)^2 + \frac{1}{2}\left(\sum_{k=1}^{\infty} \frac{y_k s_k(\alpha)}{\sigma_k^2}\right)^2 \right] \tag{8}$$

The maximization over $K$ can be done by inspection so that the estimate of $K$ becomes

$$\hat{K} = \sum_{k=1}^{\infty} \frac{y_k s_k(\hat{\alpha})}{\sigma_k^2} \tag{9}$$

and the estimates of $\alpha$ are then chosen to satisfy

$$\left(\sum_{k=1}^{\infty} \frac{y_k s_k(\hat{\alpha})}{\sigma_k^2}\right)^2 = \max_{\alpha} Y^2(\alpha) \tag{10}$$

where $Y(\alpha)$ is defined implicitly.

The signal $K s(t;\alpha)$ is related to the unknown system impulse response and the input signal by the convolution integral

$$K s(t;\alpha) = \int_{-\infty}^{\infty} x(\lambda)\, K g(t-\lambda;\alpha)\, d\lambda \tag{11}$$

and the coefficients $s_k(\alpha)$ can be expressed as


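$$s_k(\alpha) = \int_{-\infty}^{\infty} x(\lambda)\, g_k(\lambda;\alpha)\, d\lambda \tag{12}$$

where

$$g_k(\lambda;\alpha) = \int_0^T g(t-\lambda;\alpha)\,\phi_k(t)\,dt \tag{13}$$

Note that $g_k(\lambda;\alpha)$ is zero outside the interval $-T < \lambda < T$. Using these relations $Y(\alpha)$ of Eq. 10 can be written as

$$Y(\alpha) = \sum_{k=1}^{\infty} \sigma_k^{-2} \int_0^T y(t)\,\phi_k(t)\,dt \int_{-\infty}^{\infty} x(\lambda)\, g_k(\lambda;\alpha)\, d\lambda = \int_0^T dt\; y(t) \int_{-\infty}^{\infty} x(\lambda)\, g_1(t,\lambda;\alpha)\, d\lambda \tag{14}$$

where $g_1(t,\lambda;\alpha)$ is defined as

$$g_1(t,\lambda;\alpha) = \sum_{k=1}^{\infty} \frac{g_k(\lambda;\alpha)\,\phi_k(t)}{\sigma_k^2}, \qquad 0 \le t \le T \tag{15}$$

It can be shown, by direct substitution, that $g_1(t,\lambda;\alpha)$ satisfies the integral equation

$$\int_0^T R(t-s)\, g_1(s,\lambda;\alpha)\, ds = g(t-\lambda;\alpha), \qquad 0 \le t \le T \tag{16}$$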
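The direct substitution mentioned for Eq. 16 can be written out as follows. This is a sketch that assumes the series of Eq. 15 may be integrated term by term, and that $g(t-\lambda;\alpha)$ is represented on $0 \le t \le T$ by its expansion in the $\phi_k(t)$ with the coefficients $g_k(\lambda;\alpha)$ of Eq. 13.

```latex
\begin{aligned}
\int_0^T R(t-s)\, g_1(s,\lambda;\alpha)\, ds
  &= \sum_{k=1}^{\infty} \frac{g_k(\lambda;\alpha)}{\sigma_k^2}
     \int_0^T R(t-s)\,\phi_k(s)\, ds \\
  &= \sum_{k=1}^{\infty} \frac{g_k(\lambda;\alpha)}{\sigma_k^2}\,\sigma_k^2\,\phi_k(t)
   = \sum_{k=1}^{\infty} g_k(\lambda;\alpha)\,\phi_k(t)
   = g(t-\lambda;\alpha), \qquad 0 \le t \le T
\end{aligned}
```

The second line uses the eigenfunction relation of Eq. 2 with $R(t,s) = R(t-s)$.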


Substituting Eq. 14 into Eq. 10 gives the following integral expression for finding the maximum likelihood estimates of the $\{\alpha_i\}$:

$$\max_{\alpha} \left[ \int_0^T y(t) \int_{-\infty}^{\infty} x(\lambda)\, g_1(t,\lambda;\alpha)\, d\lambda\, dt \right]^2 \tag{17}$$


Physical  Interpretation 

Let $\alpha^\circ$, $\alpha_i^\circ$, and $K^\circ$ denote the true values of the parameters $\alpha$, $\alpha_i$, and $K$. Maximum likelihood estimates of the $\alpha_i^\circ$ are obtained by multiplying the output signal (plus noise) of the unknown system by the output of a filter $g_1(t,\lambda;\alpha)$, and integrating this product over the observation time, $T$. Both the unknown system and $g_1$ are subjected to the same input signal. Because of the delay, or memory, of $g$ and $g_1$, inputs between the times $-T$ and $+T$ affect the output signal during the interval $0, T$. (Recall that the observation time, $T$, was assumed to be longer than the significant duration of $g(t)$ so that inputs prior to $-T$ have a negligible effect upon the output during $0, T$.)

A physical realization of this process is illustrated in Fig. 2. The outputs from a bank of estimating filters, each with a different set of parameters $\{\alpha_i\}$, are integrated and squared. The set of parameters that corresponds to the channel with maximum output at time $T$ is then the set of $\hat{\alpha}_i$ which represents the maximum likelihood estimates of the true parameters, and the value of the signal at the output of the integrator is the estimate of $K^\circ$.
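The bank-of-filters procedure can be sketched in discrete time for the white noise case. Everything specific in the sketch is an assumption made for illustration and does not come from the paper: a one-parameter impulse response `exp(-alpha * k)`, unit sample spacing, an input that is zero before the first sample, independent noise samples of variance `noise_var`, and the names `normalized_response` and `ml_estimate`. Each candidate `alpha` plays the role of one channel; the channel with the largest squared correlation gives the parameter estimate and its correlation gives the gain estimate, in the spirit of Eqs. 9 and 10.

```python
import math
import random

def normalized_response(alpha, x, noise_var):
    """Samples s_k(alpha): response of the assumed g(k; alpha) = exp(-alpha*k)
    to the input x, scaled so that sum_k s_k^2 / noise_var = 1 (cf. Eq. 5)."""
    n = len(x)
    g = [math.exp(-alpha * k) for k in range(n)]
    s = [sum(x[j] * g[k - j] for j in range(k + 1)) for k in range(n)]
    scale = math.sqrt(sum(v * v for v in s) / noise_var)
    return [v / scale for v in s]

def ml_estimate(y, x, candidate_alphas, noise_var):
    """Return (alpha_hat, K_hat): the channel with the largest Y^2 and its Y."""
    best_alpha, best_y = None, 0.0
    for alpha in candidate_alphas:
        s = normalized_response(alpha, x, noise_var)
        corr = sum(yk * sk for yk, sk in zip(y, s)) / noise_var  # Y(alpha)
        if best_alpha is None or corr * corr > best_y * best_y:
            best_alpha, best_y = alpha, corr
    return best_alpha, best_y

# Hypothetical experiment: random binary input, true alpha on the search grid.
rng = random.Random(0)
noise_var = 0.01
x = [rng.choice((-1.0, 1.0)) for _ in range(200)]
true_alpha, true_k = 0.3, 5.0
clean = [true_k * v for v in normalized_response(true_alpha, x, noise_var)]
noisy = [v + rng.gauss(0.0, math.sqrt(noise_var)) for v in clean]
grid = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]

print("noise-free:", ml_estimate(clean, x, grid, noise_var))
print("noisy:     ", ml_estimate(noisy, x, grid, noise_var))
```

In the noise-free run the correct channel is selected and the gain estimate reproduces the true gain; with noise added the estimates scatter around those values. A grid search is used here only because it mirrors the filter bank; it is not a statement about how the authors' hardware realization was built.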

For the special case of white noise $R(t-s) = N_0\,\delta(t-s)$ so that the integral in Eq. 16 becomes trivial

$$g_1(t,\lambda;\alpha) = \frac{g(t-\lambda;\alpha)}{N_0}, \qquad 0 \le t \le T \tag{18}$$

In this case $g_1(t,\lambda;\alpha)$ is a physically realizable time invariant filter. When $n(t)$ is not white the realization of $g_1$ is not as straightforward. It is possible to approximate $g_1$ by a finite number of terms of the defining series, Eq. 15. The $\phi_k(t)$ and $\sigma_k^2$ are determined by the noise autocorrelation function $R(\tau)$, which is assumed to be known, and the $g_k(\lambda;\alpha)$ can be evaluated from the known form of $g(t)$ using Eq. 13. Such a realization is shown in Fig. 3.

The procedure for obtaining estimates of the $\alpha_i$'s for the non-white noise case is as follows. At $t = -T$ the time variable gains are "started", that is, the gains are set equal to $g_k(-T;\alpha)/\sigma_k^2$ and $\lambda$ is allowed to traverse the interval $-T, +T$. At $t = 0$ the integrators in Fig. 2 are reset, and finally at $t = T$ the outputs of each channel are examined and the channel with the maximum signal is determined. It is necessary to "start" the $g_1$ filter at $t = -T$ to insure that both $g$ and $g_1$ have corresponding initial conditions at $t = 0$.

In general $g_1(t,\lambda)$ will not be realizable. However, Eq. 17 is equivalent to

$$\max_{\alpha} \left\{ \left[ \int_T^{2T} y(t-T) \int_{-\infty}^{\infty} x(\lambda)\, g_1(t-T,\lambda;\alpha)\, d\lambda\, dt \right]^2 \right\}$$

It can be shown that $g_1(t-T,\lambda;\alpha)$ is always realizable. Thus, by delaying $y(t)$ by time $T$ and replacing $g_1(t,\lambda;\alpha)$ by $g_1(t-T,\lambda;\alpha)$ a realizable system can be obtained.

Only the analytical form of the product, $K(\alpha)\,g(t;\alpha)$, has been assumed to be known; $g(t;\alpha)$ itself is not known and the multiplying constant depends upon both $\alpha$ and $x(t)$ as well as the properties of the noise. For any given set of $\alpha$ and $x(t)$ the constant $K$ can be computed and $g(t;\alpha)$ found. At first sight this might seem discouraging because it suggests that the filters $g_1(t,\lambda;\alpha)$ cannot be constructed until $x(t)$ is known. This difficulty can be avoided however by noting that it is possible to multiply both sides of Eq. 16 by $cK(\alpha)$ and solve for $cK(\alpha)\,g_1(t,\lambda;\alpha)$ ($c$ is an unknown constant*). Here it is not possible to separate $cK(\alpha)$ from $g_1(t,\lambda;\alpha)$ without a knowledge of the input, but the product $cK(\alpha)\,g_1(t,\lambda;\alpha)$ is independent of $x(t)$.

If a bank of filters $cK(\alpha)\,g_1(t,\lambda;\alpha)$ is used in Fig. 2 the outputs of each channel would be $c^2 K^2(\alpha)\,Y^2(\alpha)$ instead of $Y^2(\alpha)$. The desired outputs would be obtained by dividing each output by the appropriate $c^2 K^2(\alpha)$. Thus a scheme for computing $cK(\alpha)$ is required. Since the analytical form of $K(\alpha)\,g(t;\alpha)$ is known, $cK(\alpha)$ can be computed for any $x(t)$ by observing $cK(\alpha)\,s(t;\alpha)$ (obtained from the output of an appropriate filter) and using Eq. 5. The maximum likelihood estimator then takes the form of Fig. 4. For the white noise case the "gain computer" is a rather simple device based on the relation

$$(cK)^2 \int_0^T s^2(t;\alpha)\,dt = (cK)^2 N_0$$

The left hand side of the above equation is the energy out of a filter with impulse response $cK(\alpha)\,g(t;\alpha)$ and can be easily obtained by a scheme such as shown in Fig. 5. For the non-white noise case a spectral analysis (with respect to the orthogonal set of functions $\phi_k(t)$) of $cK(\alpha)\,s(t;\alpha)$ is required so that the sum $c^2 \sum_k K^2(\alpha)\,s_k^2(\alpha)/\sigma_k^2$ can be computed and $cK(\alpha)$ found so as to satisfy the normalization condition of Eq. 5. As mentioned above, in the non-white noise case delays of time $T$ may be necessary to insure realizability.
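The white noise "gain computer" amounts to an energy measurement followed by a square root. The sketch below is a discrete-time illustration under assumptions of my own: a rectangle-rule approximation to the energy integral with sample spacing `dt`, a hypothetical exponential test waveform, and the function name `gain_computer`. Only the magnitude of $cK$ can be recovered this way, since the energy does not depend on the sign.

```python
import math

def gain_computer(u, dt, n0):
    """Estimate |cK| from samples u of cK * s(t; alpha).

    Uses the white noise relation (cK)^2 * integral(s^2) = (cK)^2 * N0,
    so the measured energy of u divided by N0 equals (cK)^2.
    """
    energy = sum(v * v for v in u) * dt   # rectangle-rule energy of the output
    return math.sqrt(energy / n0)

# Hypothetical check: build a waveform normalized so that its energy is N0,
# scale it by a known factor, and recover that factor.
dt, n0 = 0.01, 0.5
raw = [math.exp(-2.0 * k * dt) for k in range(500)]
norm = math.sqrt(sum(v * v for v in raw) * dt / n0)
s = [v / norm for v in raw]               # energy of s equals N0
u = [3.0 * v for v in s]                  # filter output with cK = 3
ck = gain_computer(u, dt, n0)
print(ck)
```

Dividing each channel output of the filter bank by the square of this value restores the normalization required by Eq. 5.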


* The unknown constant $c$ is introduced to emphasize the fact that only the form of $K g_1(t,\lambda;\alpha)$ is known, that is, it is known only to a scale factor, $1/c$.


Single  Channel  of  "Gain  Computer"  for  the  White  Noise  Case 
Estimation Errors

It is of interest at this point to consider the problem of finding an expression for the errors associated with the system identification. The errors will depend upon the observation interval $T$ so that it will be possible to determine the observation time that is required for some specified error variance. This figure, the identification time, would be important, for instance, in considering the stability of an adaptive control loop.

Before becoming involved with the mathematical details, the general philosophy of the approach will be outlined. The solution for the set of values $\{\hat{\alpha}_i\}$ which maximizes Eq. 17 can be obtained by simultaneously setting the partial derivatives (cf. Eq. 14)

$$\frac{\partial Y^2(\alpha)}{\partial \alpha_i}, \qquad i = 1, 2, \ldots, p$$

equal to zero and solving the resultant (in general nonlinear) set of equations in the $\{\alpha_i\}$. This procedure is usually not practical. When the signal is sufficiently strong however $Y^2(\alpha)$ may be accurately represented, in the neighborhood of $\alpha^\circ$, by the first few terms of a Taylor's series expansion in the $\{\alpha_i\}$ variables, and approximations, $\{\tilde{\alpha}_i\}$, to the $\{\hat{\alpha}_i\}$ can be obtained by maximizing this expansion with respect to the $\{\alpha_i\}$. This is the procedure that is followed here. The approximate errors $\{\delta\alpha_i\} = \{\tilde{\alpha}_i - \alpha_i^\circ\}$ are then expressed to first order accuracy in $n(t)$ and the variances $\sigma_{ij} = E(\delta\alpha_i\,\delta\alpha_j)$ are determined. The derivation presented here closely follows that of Kelly (10) with specialization of the ambiguity function $Q(\alpha,\alpha^\circ)$ (defined below) to the control problem.

10. ibid.
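Since

$$y(t) = K^\circ s(t;\alpha^\circ) + n(t)$$

$Y(\alpha)$ (Eq. 14) can be expressed as

$$Y(\alpha) = K^\circ \int_0^T s(t;\alpha^\circ) \int_{-\infty}^{\infty} x(\lambda)\, g_1(t,\lambda;\alpha)\, d\lambda\, dt + \int_0^T n(t) \int_{-\infty}^{\infty} x(\lambda)\, g_1(t,\lambda;\alpha)\, d\lambda\, dt \tag{20}$$

By defining

$$Q(\alpha,\alpha^\circ) = \int_0^T s(t;\alpha^\circ) \int_{-\infty}^{\infty} x(\lambda)\, g_1(t,\lambda;\alpha)\, d\lambda\, dt = \sum_{k=1}^{\infty} \sigma_k^{-2} \int_0^T s(t;\alpha^\circ)\,\phi_k(t)\,dt \int_{-\infty}^{\infty} x(\lambda)\, g_k(\lambda;\alpha)\, d\lambda = \sum_{k=1}^{\infty} \sigma_k^{-2}\, s_k(\alpha^\circ)\, s_k(\alpha) \tag{21}$$

$$N(\alpha) = \int_0^T n(t) \int_{-\infty}^{\infty} x(\lambda)\, g_1(t,\lambda;\alpha)\, d\lambda\, dt = \sum_{k=1}^{\infty} \sigma_k^{-2} \int_0^T n(t)\,\phi_k(t)\,dt \int_{-\infty}^{\infty} x(\lambda)\, g_k(\lambda;\alpha)\, d\lambda = \sum_{k=1}^{\infty} \sigma_k^{-2}\, n_k\, s_k(\alpha) \tag{22}$$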


Equation 20 can be expressed as

$$Y(\alpha) = K^\circ Q(\alpha,\alpha^\circ) + N(\alpha) \tag{23}$$

Because of the normalization introduced in Eq. 5

$$Q(\alpha,\alpha) = \sum_{k=1}^{\infty} \frac{s_k^2(\alpha)}{\sigma_k^2} = 1 \tag{24}$$

and from the Schwarz inequality

$$Q(\alpha,\alpha^\circ) \le 1 \tag{25}$$

Thus $Q(\alpha,\alpha^\circ)$ attains its maximum value at $\alpha = \alpha^\circ$. There may be other values of $\alpha$ which maximize $Q(\alpha,\alpha^\circ)$. These values correspond to ambiguities in the parameter estimation problem. It is assumed here that either these ambiguities do not exist, or that they are resolved by other means.

Equation 24 guarantees that if the noise were zero ($y(t) = K^\circ s(t;\alpha^\circ)$) for a particular observation, $Y^2(\alpha)$ would be maximized by the set $\hat{\alpha} = \alpha^\circ$ and the gain estimate would be, from Eqs. 9 and 10,

$$\hat{K} = Y(\hat{\alpha}) = Y(\alpha^\circ) = K^\circ \tag{26}$$

which is also the true value.
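The Schwarz inequality step behind Eq. 25 can be made explicit. The sketch below uses only the series form of $Q$ from Eq. 21 and the normalization of Eq. 5, applied at both $\alpha$ and $\alpha^\circ$.

```latex
Q^2(\alpha,\alpha^\circ)
  = \left(\sum_{k=1}^{\infty} \frac{s_k(\alpha^\circ)}{\sigma_k}\,
          \frac{s_k(\alpha)}{\sigma_k}\right)^{2}
  \le \left(\sum_{k=1}^{\infty} \frac{s_k^2(\alpha^\circ)}{\sigma_k^2}\right)
      \left(\sum_{k=1}^{\infty} \frac{s_k^2(\alpha)}{\sigma_k^2}\right)
  = 1
```

Equality holds only when the sequences $s_k(\alpha)/\sigma_k$ and $s_k(\alpha^\circ)/\sigma_k$ are proportional, which under the normalization means $s_k(\alpha) = \pm s_k(\alpha^\circ)$ for every $k$. Any $\alpha \ne \alpha^\circ$ with this property is one of the ambiguities mentioned in the text.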

For the strong signal case the estimates will be close to the true values so that $Y^2(\alpha)$ can be expanded in a Taylor's series about $\alpha^\circ$. Keeping terms only up to the quadratic term

$$Y^2(\alpha) \approx Y^2(\alpha^\circ) + \sum_{i=1}^{p} b_i\,\delta\alpha_i + \frac{1}{2}\sum_{i,j=1}^{p} c_{ij}\,\delta\alpha_i\,\delta\alpha_j \tag{27}$$

where

$$b_i = \left.\frac{\partial Y^2(\alpha)}{\partial \alpha_i}\right|_{\alpha=\alpha^\circ} \tag{28}$$

and

$$c_{ij} = \left.\frac{\partial^2 Y^2(\alpha)}{\partial \alpha_i\,\partial \alpha_j}\right|_{\alpha=\alpha^\circ} \tag{29}$$

Setting the derivatives of Eq. 27 with respect to the $\{\alpha_i\}$ equal to zero and solving, one obtains for the deviation of the approximate estimate from the true parameter value

$$\tilde{\alpha}_i - \alpha_i^\circ = \delta\alpha_i = -\sum_{j=1}^{p} (C^{-1})_{ij}\, b_j \tag{30}$$

where $(C^{-1})_{ij}$ is the $i, j$ element of $C^{-1}$, the inverse of $C = [c_{ij}]$.

Substituting Eq. 30 back into Eq. 27

$$Y^2(\tilde{\alpha}) = Y^2(\alpha^\circ) - \frac{1}{2}\sum_{i,j=1}^{p} c_{ij}\,\delta\alpha_i\,\delta\alpha_j \tag{31}$$
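The step from Eq. 30 to Eq. 31 is short in matrix notation. The sketch below assumes that $C$ is symmetric and invertible, and writes $b$ and $\delta$ for the column vectors of the $b_i$ and $\delta\alpha_i$; this vector notation is introduced here and is not the paper's.

```latex
\begin{aligned}
\nabla_{\delta}\left[\, b^{T}\delta + \tfrac{1}{2}\,\delta^{T} C\,\delta \right] = 0
  \;&\Longrightarrow\; b + C\,\delta = 0
  \;\Longrightarrow\; \delta = -C^{-1} b, \\
Y^2(\tilde{\alpha})
  &= Y^2(\alpha^\circ) + b^{T}\delta + \tfrac{1}{2}\,\delta^{T} C\,\delta
   = Y^2(\alpha^\circ) - \delta^{T} C\,\delta + \tfrac{1}{2}\,\delta^{T} C\,\delta
   = Y^2(\alpha^\circ) - \tfrac{1}{2}\,\delta^{T} C\,\delta
\end{aligned}
```

Because $C$ is negative definite near a maximum, the last term is non-negative, so the quadratic approximation evaluated at $\tilde{\alpha}$ is at least as large as its value at $\alpha^\circ$.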


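The approximate error moments can be found by computing the $\delta\alpha_i$ (Eq. 30) to first order in the noise. Up to first order in noise (Eq. 23)

$$Y^2(\alpha) = K^{\circ 2}\, Q^2(\alpha,\alpha^\circ) + 2K^\circ\, Q(\alpha,\alpha^\circ)\, N(\alpha) \tag{32}$$

and

$$b_i = 2K^\circ \left.\frac{\partial N(\alpha)}{\partial \alpha_i}\right|_{\alpha=\alpha^\circ} \tag{33}$$

In obtaining Eq. 33 use was made of the fact that $Q(\alpha^\circ,\alpha^\circ) = 1$ and that because $Q(\alpha,\alpha^\circ)$ has a maximum at $\alpha = \alpha^\circ$

$$\left.\frac{\partial Q(\alpha,\alpha^\circ)}{\partial \alpha_i}\right|_{\alpha=\alpha^\circ} = 0$$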


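Since $b_i$ has no zero order term, from Eq. 30 it can be seen that it will suffice to calculate only the zero order term of $(C^{-1})_{ij}$. To the required accuracy

$$c_{ij} = K^{\circ 2} \left.\frac{\partial^2 Q^2(\alpha,\alpha^\circ)}{\partial \alpha_i\,\partial \alpha_j}\right|_{\alpha=\alpha^\circ} = 2K^{\circ 2} \left.\frac{\partial^2 Q(\alpha,\alpha^\circ)}{\partial \alpha_i\,\partial \alpha_j}\right|_{\alpha=\alpha^\circ} = -2K^{\circ 2}\,(M^{-1})_{ij} \tag{34}$$

where $m_{ij}$ is defined as

$$m_{ij} = (M)_{ij}, \qquad M = \left[\, -\left.\frac{\partial^2 Q(\alpha,\alpha^\circ)}{\partial \alpha_i\,\partial \alpha_j}\right|_{\alpha=\alpha^\circ} \right]^{-1} \tag{35}$$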


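Substituting Eqs. 33, 34, and 35 into Eq. 30 we obtain to the required accuracy

$$\tilde{\alpha}_i - \alpha_i^\circ = \delta\alpha_i = \frac{1}{K^\circ} \sum_{j=1}^{p} m_{ij} \left.\frac{\partial N(\alpha)}{\partial \alpha_j}\right|_{\alpha=\alpha^\circ} \tag{36}$$

Since $\hat{\alpha}_i - \alpha_i^\circ \approx \tilde{\alpha}_i - \alpha_i^\circ$, Eq. 36 yields an approximate expression for $\hat{\alpha}_i$ depending on the true value $\alpha_i^\circ$. The expected values of the errors are

$$E(\delta\alpha_i) = \frac{1}{K^\circ} \sum_{j=1}^{p} m_{ij}\, E\left\{\left.\frac{\partial N(\alpha)}{\partial \alpha_j}\right|_{\alpha=\alpha^\circ}\right\} = \frac{1}{K^\circ} \sum_{j=1}^{p} m_{ij} \sum_{k=1}^{\infty} \sigma_k^{-2} \left.\frac{\partial s_k(\alpha)}{\partial \alpha_j}\right|_{\alpha=\alpha^\circ} E(n_k) = 0 \tag{37}$$

because $E(n_k) = 0$. Thus the $\tilde{\alpha}_i$ are unbiased. The variances of the estimates are

$$E(\delta\alpha_i\,\delta\alpha_j) = \frac{1}{K^{\circ 2}} \sum_{k=1}^{p} \sum_{\ell=1}^{p} m_{ik}\, m_{j\ell}\, E\left\{\left.\frac{\partial N(\alpha)}{\partial \alpha_k}\right|_{\alpha=\alpha^\circ} \left.\frac{\partial N(\alpha)}{\partial \alpha_\ell}\right|_{\alpha=\alpha^\circ}\right\} \tag{38}$$

The expectation in Eq. 38 is

$$E\left\{\left.\frac{\partial N(\alpha)}{\partial \alpha_i}\right|_{\alpha=\alpha^\circ} \left.\frac{\partial N(\alpha)}{\partial \alpha_j}\right|_{\alpha=\alpha^\circ}\right\} = \sum_{k=1}^{\infty} \sum_{\ell=1}^{\infty} \sigma_k^{-2}\,\sigma_\ell^{-2} \left.\frac{\partial s_k(\alpha)}{\partial \alpha_i}\right|_{\alpha=\alpha^\circ} \left.\frac{\partial s_\ell(\alpha)}{\partial \alpha_j}\right|_{\alpha=\alpha^\circ} E(n_k n_\ell) = \sum_{k=1}^{\infty} \sigma_k^{-2} \left.\frac{\partial s_k(\alpha)}{\partial \alpha_i}\,\frac{\partial s_k(\alpha)}{\partial \alpha_j}\right|_{\alpha=\alpha^\circ} \tag{39}$$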

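This last result may be further simplified by differentiating Eq. 24 twice giving

$$2 \sum_{k=1}^{\infty} \sigma_k^{-2} \left[ s_k(\alpha)\,\frac{\partial^2 s_k(\alpha)}{\partial \alpha_i\,\partial \alpha_j} + \frac{\partial s_k(\alpha)}{\partial \alpha_i}\,\frac{\partial s_k(\alpha)}{\partial \alpha_j} \right]_{\alpha=\alpha^\circ} = 0 \tag{40}$$

Using this result in Eq. 39

$$E\left\{\left.\frac{\partial N(\alpha)}{\partial \alpha_i}\right|_{\alpha=\alpha^\circ} \left.\frac{\partial N(\alpha)}{\partial \alpha_j}\right|_{\alpha=\alpha^\circ}\right\} = -\sum_{k=1}^{\infty} \sigma_k^{-2}\, s_k(\alpha^\circ) \left.\frac{\partial^2 s_k(\alpha)}{\partial \alpha_i\,\partial \alpha_j}\right|_{\alpha=\alpha^\circ} = -\left.\frac{\partial^2 Q(\alpha,\alpha^\circ)}{\partial \alpha_i\,\partial \alpha_j}\right|_{\alpha=\alpha^\circ} \tag{41}$$

Finally, using this last result in Eq. 38 results in a relatively simple expression for the covariances

$$E(\delta\alpha_i\,\delta\alpha_j) = -\frac{1}{K^{\circ 2}} \sum_{k=1}^{p} \sum_{\ell=1}^{p} m_{ik}\, m_{j\ell} \left.\frac{\partial^2 Q(\alpha,\alpha^\circ)}{\partial \alpha_k\,\partial \alpha_\ell}\right|_{\alpha=\alpha^\circ} = \frac{1}{K^{\circ 2}} \sum_{k=1}^{p} \sum_{\ell=1}^{p} m_{ik}\, m_{j\ell}\,(M^{-1})_{k\ell} = \frac{m_{ij}}{K^{\circ 2}}, \qquad (i, j = 1, \ldots, p) \tag{42}$$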

Thus the approximate variances and covariances associated with measuring the parameters $\{\alpha_i\}$ depend only upon the second partial derivatives of $Q(\alpha,\alpha^\circ)$ evaluated at $\alpha = \alpha^\circ$. This result has a simple geometrical interpretation; the covariances depend only upon the curvature of the likelihood function at its maximum. The variances depend upon the properties of the noise through the normalization constant $K^\circ$. When the noise is white $1/K^{\circ 2} = N_0/E$, where $E$ is the energy of the output signal.
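For a single parameter the covariance result reduces to a variance of $m/K^{\circ 2}$ with $m = -1/(\partial^2 Q/\partial\alpha^2)$ evaluated at the true value. The sketch below evaluates this numerically for a discrete-time white noise example. The exponential impulse response, the step input, the central-difference step `h`, and all function names are assumptions chosen for illustration, not quantities taken from the paper.

```python
import math

def normalized_response(alpha, x, noise_var):
    """Response of the assumed g(k; alpha) = exp(-alpha*k) to x, scaled so
    that sum_k s_k^2 / noise_var = 1 (discrete analogue of Eq. 5)."""
    n = len(x)
    g = [math.exp(-alpha * k) for k in range(n)]
    s = [sum(x[j] * g[k - j] for j in range(k + 1)) for k in range(n)]
    scale = math.sqrt(sum(v * v for v in s) / noise_var)
    return [v / scale for v in s]

def ambiguity(alpha, alpha0, x, noise_var):
    """Q(alpha, alpha0) = sum_k s_k(alpha0) s_k(alpha) / noise_var (cf. Eq. 21)."""
    s = normalized_response(alpha, x, noise_var)
    s0 = normalized_response(alpha0, x, noise_var)
    return sum(a * b for a, b in zip(s, s0)) / noise_var

def approx_variance(alpha0, k0, x, noise_var, h=1e-3):
    """Strong-signal variance m / K0^2, with the curvature of Q at alpha0
    estimated by a central difference."""
    q_plus = ambiguity(alpha0 + h, alpha0, x, noise_var)
    q_zero = ambiguity(alpha0, alpha0, x, noise_var)
    q_minus = ambiguity(alpha0 - h, alpha0, x, noise_var)
    d2q = (q_plus - 2.0 * q_zero + q_minus) / (h * h)   # negative at a maximum
    m = -1.0 / d2q
    return m / (k0 * k0)

x = [1.0] * 100                     # hypothetical step input
var_20 = approx_variance(0.3, 20.0, x, 0.01)
var_40 = approx_variance(0.3, 40.0, x, 0.01)
print(var_20, var_40)
```

Doubling the gain $K^\circ$ divides the approximate variance by four, which is the white noise statement $1/K^{\circ 2} = N_0/E$ read as a dependence on output signal energy.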

Properties of the $Q(\alpha,\alpha^\circ)$ Function for the Control Problem

In the control problem the $Q(\alpha,\alpha^\circ)$ function (Eq. 21) is conveniently expressed as

$$Q(\alpha,\alpha^\circ) = \int_0^T \int_{-\infty}^{\infty} x(\lambda_1)\, g(t-\lambda_1;\alpha^\circ)\, d\lambda_1 \int_{-\infty}^{\infty} x(\lambda_2)\, g_1(t,\lambda_2;\alpha)\, d\lambda_2\, dt = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x(\lambda_1)\, x(\lambda_2)\, q(\lambda_1,\lambda_2;\alpha,\alpha^\circ)\, d\lambda_1\, d\lambda_2 \tag{43}$$

where

$$q(\lambda_1,\lambda_2;\alpha,\alpha^\circ) = \int_0^T g(t-\lambda_1;\alpha^\circ)\, g_1(t,\lambda_2;\alpha)\, dt \tag{44}$$

Note that $q(\lambda_1,\lambda_2;\alpha,\alpha^\circ)$ is zero when $|\lambda_1|$ or $|\lambda_2| > T$. The advantage of expressing $Q(\alpha,\alpha^\circ)$ in terms of $q(\lambda_1,\lambda_2;\alpha,\alpha^\circ)$ is that $q(\lambda_1,\lambda_2;\alpha,\alpha^\circ)$ is independent of the input signal and makes it possible to express the derivatives of $Q$ in terms of the derivatives of $q$. Such a procedure simplifies the computations when it is desired to study the variances of the estimates for the different types of input signals occurring in control systems. If $x(t)$ should be a test signal used solely for identification purposes, then Eq. 43 would prove useful in a search for a test signal to minimize the variance of the error.
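The separation of the input from the filter kernel in Eqs. 43 and 44 can be illustrated with a discrete-time white noise sketch. The code assumes exponential impulse responses, an input that is zero before the first sample, and responses left unnormalized (so the value computed is not bounded by one as the paper's $Q$ is); the function names are hypothetical. It checks that the double sum over the input weighted by a kernel that depends only on the filters equals the direct correlation of the two filter outputs.

```python
import math

def impulse_response(alpha, n):
    """Assumed impulse response samples g(k; alpha) = exp(-alpha*k)."""
    return [math.exp(-alpha * k) for k in range(n)]

def q_kernel(alpha, alpha0, n, noise_var):
    """q[l1][l2] = sum_t g(t-l1; alpha0) * g(t-l2; alpha) / noise_var.
    Depends on the two filters only, not on the input (cf. Eq. 44)."""
    g, g0 = impulse_response(alpha, n), impulse_response(alpha0, n)
    return [[sum(g0[t - l1] * g[t - l2] for t in range(max(l1, l2), n)) / noise_var
             for l2 in range(n)]
            for l1 in range(n)]

def q_from_kernel(x, q):
    """Double sum of x(l1) x(l2) q(l1, l2) (cf. Eq. 43)."""
    n = len(x)
    return sum(x[a] * x[b] * q[a][b] for a in range(n) for b in range(n))

def q_direct(x, alpha, alpha0, noise_var):
    """Correlation of the two filter outputs, computed without the kernel."""
    n = len(x)
    g, g0 = impulse_response(alpha, n), impulse_response(alpha0, n)
    s = [sum(x[j] * g[t - j] for j in range(t + 1)) for t in range(n)]
    s0 = [sum(x[j] * g0[t - j] for j in range(t + 1)) for t in range(n)]
    return sum(a * b for a, b in zip(s, s0)) / noise_var

n, noise_var = 30, 1.0
kernel = q_kernel(0.4, 0.3, n, noise_var)          # computed once
step = [1.0] * n
alternating = [(-1.0) ** k for k in range(n)]
for name, x in (("step", step), ("alternating", alternating)):
    print(name, q_from_kernel(x, kernel), q_direct(x, 0.4, 0.3, noise_var))
```

The kernel is computed once and reused for both inputs, which is the computational convenience the text attributes to working with $q$ when comparing candidate test signals.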
Conclusions

Maximum likelihood estimation techniques can be applied to the identification problem of control engineering, and a practical maximum likelihood estimator can be synthesized. For the case of white noise the realization is quite feasible. In the non-white noise case the realization is complicated by two factors: the solution of the appropriate integral equation leading to $g_1(t,\lambda;\alpha)$, and the realization of the estimating filter itself.

Analysis of the large signal case provides expressions for the variances of the maximum likelihood estimates. These calculations provide a means for evaluating the maximum likelihood estimation technique without the necessity of actually building or simulating the device. Also, since in the white noise case maximum likelihood estimation is equivalent to the classical method of least squares, the same expressions give the variances associated with a least square error type of identification technique.

As  a  further  application  of  our  analysis  it  should  be  noted 
that  the  control  system  problem  as  formulated  herein  is  the  same 
as  the  radar  problem  where  the  signal  to  be  transmitted  is  not 
known  a  priori  but  only  decided  upon  on  the  basis  of  past  returns. 
Thus  we  have  also  treated  the  adaptive  radar  problem. 